Back to Article
Article Notebook
Download Source

Diaptera wings classification using Topological Data Analysis

Authors
Affiliation

Guilherme Vituri F. Pinto

Universidade Estadual Paulista

Sergio Ura

Northon

Published

December 5, 2025

Abstract

We studied etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc etc

Keywords

Topological Data Analysis, Persistent homology

In [1]:
using TDAfly, TDAfly.Preprocessing, TDAfly.TDA, TDAfly.Analysis
using Images: mosaicview
using Plots: plot, display, heatmap, scatter
using PersistenceDiagrams

1 Introduction

Falar sobre o dataset, TDA, etc.

2 Methods

All images are in the images/processed directory. For each image, we load it, apply a gaussian blur, crop and make it have 150 pixels of height. The blurring step is necessary to “glue” small holes in the figure and keep it connected.

In [1]:
paths = readdir("images/processed", join = true)
species = basename.(paths) .|> (x -> replace(x, ".png" => ""))
individuals = map(species) do specie
  s = split(specie, " ")
  s[1][1] * "-" * s[2]
end
wings = load_wing.(paths, blur = 1.3)
Xs = map(image_to_r2, wings);
In [1]:
mosaicview(wings, ncol = 4, fillvalue=1)

2.1 Vietoris-Rips filtration

We select 500 points from each image using a farthest point sample method

In [1]:
samples = map(Xs) do X
  ids = farthest_points_sample(X, 500)
  X[ids]
end;
Progress:   0%|█                                        |  ETA: 0:01:28
Progress: 100%|█████████████████████████████████████████| Time: 0:00:01

Progress:  40%|█████████████████                        |  ETA: 0:00:00
Progress:  85%|████████████████████████████████████     |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  27%|████████████                             |  ETA: 0:00:00
Progress:  59%|████████████████████████                 |  ETA: 0:00:00
Progress:  90%|█████████████████████████████████████    |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  27%|███████████                              |  ETA: 0:00:00
Progress:  57%|████████████████████████                 |  ETA: 0:00:00
Progress:  86%|████████████████████████████████████     |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  32%|██████████████                           |  ETA: 0:00:00
Progress:  65%|███████████████████████████              |  ETA: 0:00:00
Progress:  96%|████████████████████████████████████████ |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  40%|█████████████████                        |  ETA: 0:00:00
Progress:  77%|████████████████████████████████         |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  29%|████████████                             |  ETA: 0:00:00
Progress:  60%|█████████████████████████                |  ETA: 0:00:00
Progress:  91%|██████████████████████████████████████   |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  37%|████████████████                         |  ETA: 0:00:00
Progress:  73%|██████████████████████████████           |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  42%|██████████████████                       |  ETA: 0:00:00
Progress:  84%|███████████████████████████████████      |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  42%|██████████████████                       |  ETA: 0:00:00
Progress:  81%|██████████████████████████████████       |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  39%|█████████████████                        |  ETA: 0:00:00
Progress:  85%|███████████████████████████████████      |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  46%|███████████████████                      |  ETA: 0:00:00
Progress:  92%|██████████████████████████████████████   |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  51%|█████████████████████                    |  ETA: 0:00:00
Progress:  99%|█████████████████████████████████████████|  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  63%|██████████████████████████               |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  53%|██████████████████████                   |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

Progress:  80%|█████████████████████████████████        |  ETA: 0:00:00
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00

and create an empty dictionary to store all computations

In [1]:
simple_rips_dc = Dict();

We then calculate its persistence diagrams using the Vietoris-Rips filtration etc.

In [1]:
# get only the 1-dimensional PD
simple_rips_dc["PD"] = rips_pd.(samples, cutoff = 5, threshold = 200) .|> last;

We create the 1-dimensional persistence image for each persistence diagram using 10x10 matrices

In [1]:
PI = PersistenceImage(simple_rips_dc["PD"], size = (10, 10))

simple_rips_dc["PI"] = PI.(simple_rips_dc["PD"]);

2.1.1 Examples

Below are some examples of 1-dimensional barcodes, its persistence image and the original wing that generated it. Note: we are plotting the barcode using the birth and persistence.

In [1]:
# plot some images to see the barcodes
map([1, 4, 8, 10, 15]) do i
  p = plot_wing_with_pd(simple_rips_dc["PD"][i], simple_rips_dc["PI"][i], samples[i], species[i])
  display(p)
end;

We now calculate the Euclidean distance between each persistence image (seen as a vector of \(\mathbb{R}^{10x10}\)) and plot its heatmap

In [1]:
simple_rips_dc["Distance_matrix"] = pairwise_distance(simple_rips_dc["PI"]);
In [1]:
plot_heatmap(
  simple_rips_dc["Distance_matrix"], 
  individuals, 
  "Distance matrix for Vietoris-Rips barcodes"
)

2.2 Persistence Homology Transform

Now we will create several filtrations based on points and lines, etc.

We start with the point (0, 0). Its filtration is the following

In [1]:
A = wings[10] |> image_to_array;
f = dist_to_point(0, 0)
Af = modify_array(A, f)
heatmap(Af)

with corresponding sublevel barcode as

In [1]:
point_pds = cubical_pd(Af, cutoff = 0.05)
plot_pd(point_pds)

or, with persistence in the y-axis:

In [1]:
plot_pd(point_pds, persistence = true)

Let’s see step-by-step of this filtration:

In [1]:
for tr  reverse([0:0.1:1;])
  X = findall_ids(>(tr), Af)
  title = "threshold: $tr"
  p = scatter(first.(X), last.(X), title = title)
  display(p)
end

Due to noise, some connected components are born in 0.2 and die only at 0. But the loops seems alright.

Let’s apply this idea to all images!

In [1]:
point_rips_dc = Dict()
f = dist_to_point(0, 0)

point_rips_dc["PD"] = map(wings) do wing
  A = wing |> image_to_array;
  Af = modify_array(A, f)
  point_pds = cubical_pd(Af, cutoff = 0.05)[2]
end;

PI = PersistenceImage(point_rips_dc["PD"], size = (10, 10))
point_rips_dc["PI"] = PI.(point_rips_dc["PD"]);
In [1]:
final_dc = Dict();

dics = [simple_rips_dc, point_rips_dc]

final_dc["PI"] = simple_rips_dc["PI"] + point_rips_dc["PI"];
In [1]:
final_dc["Distance_matrix"] = pairwise_distance(final_dc["PI"]);
In [1]:
plot_heatmap(
  final_dc["Distance_matrix"], 
  individuals, 
  "Distance matrix for Vietoris-Rips + point-based barcodes"
)

What happened?! Some wing has a big distance from everything else!

In [1]:
Xs = point_rips_dc["PI"]
id = findfirst(==("c-39"), individuals)
# id = 9
X = Xs[id]
heatmap(X)
In [1]:
wings[id]
In [1]:
heatmap(wings[id])

The reason is that it has one small cycle right at the beggining of the filtration…

3 Draft…..

In [1]:
ps = map(reverse([0:0.1:0.9;])) do tr
  X = findall_ids(>(tr), Af)
  title = "threshold: $tr"
  p = scatter(first.(X), last.(X), title = title)
  # display(p)
end

plot(ps...)